In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np 

This exercise consists of 3 parts. Finish the first part to get a mark of 3.0, the first two parts to get 4.0, and all three parts to get 5.0.

Part 1: Linear layer¶

1.1) Let us start with a linear regression problem. Consider a linear function with noise: $y = a \cdot x + b + \text{noise}$.

We use this formula to generate $100$ random samples.

In [2]:
### The number of samples
n = 100 
### parameters of the linear function
a = -2 
b = 3

1.2) Now, let us generate 100 samples and plot them.

In [3]:
### generate equally spaced x-values
x = np.linspace(-1, 1, n) 
### generate y-values (NumPy computes the whole vector y in a single expression)
y = a * x + b + np.random.normal(scale=0.25, size=n)

plt.scatter(x, y)
Out[3]:
<matplotlib.collections.PathCollection at 0x1102e74c0>
[figure: scatter plot of the generated samples]

1.3) As you may see, the samples lie, more or less, along a single line. Our aim is to find the parameters of a linear function such that the resulting model describes the given data as well as possible. To this end, we will iteratively search the parameter space and update the model. First, we need to define an error function, which tells us how well (or how badly) the current model describes the data. Here we use the mean square error function.

We define a mean square error function as:
$MSE = \dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \widehat{y}_i\right)^2,$

where $y_i$ are the target values (i.e., the data) and $\widehat{y}_i$ are the model's output values.

See the MSE (mean square error) function given below.

In [4]:
def mse(y_target, y_calc):
    return ((y_target - y_calc) ** 2).mean()

1.4) Run the code below for different parameters of the model. Which parameter values give the best (i.e., minimal) MSE?

Answer: The best (lowest) MSE was achieved for the original parameters a = -2 and b = 3.

In [7]:
a_2 = -2
b_2 = 3

y_calc = a_2 * x + b_2
print("MSE  =  " + str(mse(y, y_calc)))

plt.scatter(x, y, label="target")
plt.scatter(x, y_calc, label="calculated")
plt.legend()
MSE  =  0.0656690568744481
Out[7]:
<matplotlib.legend.Legend at 0x1104f7a60>
[figure: target vs. calculated points]
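To double-check this, one can brute-force a small grid of parameter values (an illustrative sketch, not part of the assignment; it assumes the x, y, and mse defined above):

best = min((mse(y, a_i * x + b_i), a_i, b_i)
           for a_i in np.linspace(-4, 0, 41)
           for b_i in np.linspace(1, 5, 41))
print(best)  # the smallest MSE on the grid is reached near a = -2, b = 3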

1.5) We want to find the best model parameters automatically. For this reason, we use the gradient of a loss function. The gradient points in the direction of the fastest increase of the function, so moving the parameters against it decreases the loss. We use this information to update both model parameters. The procedure is performed iteratively: in each iteration, the parameters a and b are slightly modified so that the MSE is reduced (i.e., improved).

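Concretely, each iteration applies the update rule $a \leftarrow a - lr \cdot \frac{\partial MSE}{\partial a}$ and $b \leftarrow b - lr \cdot \frac{\partial MSE}{\partial b}$, where $lr$ is the learning rate.
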
First, finish the function below. It should calculate the batch gradient of the loss function, i.e., the gradient of the MSE for each point separately (y_target and y_calc are arrays, not scalars, so the output should also be an array).

In [10]:
def mse_grad(y_target, y_calc):
    # per-point gradient of the squared error w.r.t. y_calc;
    # the constant factor 2 is dropped (absorbed into the learning rate),
    # and the mean over points is taken later, in update()
    return y_calc - y_target


### TEST
print(mse_grad(y, y_calc))
[ 0.16435044  0.10530027  0.35953788  0.09790265 -0.20403568 -0.25837372
  0.05689676 -0.29803973 -0.53201142 -0.0432235   0.33343447  0.10328341
 -0.60409687  0.02027015  0.4736115  -0.21370767  0.21591511  0.40266068
  0.56601944 -0.05137797  0.18767104  0.06748695 -0.1879076  -0.20617886
 -0.15836553 -0.16336731  0.04617821 -0.00684341 -0.12529731  0.12018349
 -0.38429573  0.15270568  0.19339822  0.00229354 -0.25456968 -0.12439183
 -0.12556509 -0.15535872 -0.07311001 -0.15435593  0.32578137  0.26337263
  0.07163233  0.25972974 -0.09615268 -0.03085522  0.01820603  0.10348768
  0.09528317 -0.0145715  -0.02283473  0.02136038 -0.13141972 -0.00436607
  0.0168016  -0.34427812 -0.06435793 -0.23591849 -0.34047201  0.40400917
  0.35736458  0.02164308  0.08913521 -0.12835901 -0.17615402 -0.22754227
 -0.66515678 -0.15496322  0.20720018 -0.49948568 -0.27297432 -0.23819038
 -0.10911635 -0.05813818 -0.15376758 -0.13818918  0.00649086  0.58984702
 -0.22837654 -0.54988911  0.08655574  0.22054453  0.08136499 -0.44276635
 -0.28361487 -0.28359411 -0.51896772  0.1044107  -0.09017026  0.01769262
 -0.21311841 -0.491339    0.30520476  0.26076527  0.15847168  0.26379813
 -0.24886087  0.30243846  0.10489043 -0.01035905]

1.6) Fill in the update function: calculate the gradients of parameters $a$ and $b$ based on the gradient of the loss function (grad_y) and the input vector (x), then update $a$ and $b$ based on their gradients and the learning rate (lr). Use batch gradient descent for the update.

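Hint (chain rule): since $\widehat{y}_i = a x_i + b$, we have $\frac{\partial \widehat{y}_i}{\partial a} = x_i$ and $\frac{\partial \widehat{y}_i}{\partial b} = 1$, so the parameter gradients are the means of grad_y * x and grad_y over the batch (again up to the constant factor 2 absorbed into the learning rate).
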
In [27]:
class LinearLayer:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __call__(self, x):
        return self.a * x + self.b

    def update(self, x, grad_y, lr):
        # parameter gradients averaged over the batch (chain rule)
        grad_a = (grad_y * x).mean()
        grad_b = grad_y.mean()

        # gradient descent step
        self.a -= lr * grad_a
        self.b -= lr * grad_b

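As a quick sanity check (illustrative, not part of the assignment; it assumes the x, y, mse, mse_grad, and LinearLayer defined above), the analytic gradient can be compared against a finite-difference estimate:

eps = 1e-6
grad_a = (mse_grad(y, LinearLayer(1.1, 2.0)(x)) * x).mean()
num = (mse(y, LinearLayer(1.1 + eps, 2.0)(x)) - mse(y, LinearLayer(1.1 - eps, 2.0)(x))) / (2 * eps)
print(grad_a, num / 2)  # should agree closely (num carries the factor 2 that mse_grad drops)
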
1.7) Write a Step function which, given input x, calculates the model output y_calc, the loss, and the gradient of the loss, and then updates the model parameters.

In [28]:
def Step(x, y, model, lr):
    y_calc = model(x)
    loss = mse(y, y_calc)
    grad_y = mse_grad(y, y_calc)
    model.update(x, grad_y, lr)
    return y_calc, loss

1.8) Fit the model for 100 epochs, with learning rate 0.05 and initial parameter values a = 1.1 and b = 2.

In [29]:
model = LinearLayer(1.1, 2)
In [30]:
lr = 0.05
In [31]:
epoch = 200
losses = []
for i in range(epoch):
    y_calc, loss = Step(x, y, model, lr)
    losses.append(loss)
In [32]:
plt.plot(losses)
Out[32]:
[<matplotlib.lines.Line2D at 0x14fabeaf0>]
[figure: training loss per epoch]

Animation of the learning process

In [33]:
from matplotlib import animation, rc
rc('animation', html='jshtml')
In [34]:
model = LinearLayer(1.1, 2)
In [35]:
fig = plt.figure()
plt.scatter(x, y)
line, = plt.plot(x, y_calc, ".", c="orange")
plt.close()


def animate(i):
    y_calc, loss = Step(x, y, model, lr)
    line.set_ydata(y_calc)
    return (line,)


animation.FuncAnimation(fig, animate, np.arange(0, epoch), interval=20)
Out[35]:
[animation: model predictions converging to the data]

1.9) Here is an example of how the same can be done in PyTorch.

In [20]:
# Imports
import torch
import torch.nn as nn
In [43]:
# Convert numpy arrays to torch tensors; [:, None] adds an extra dimension
xt = torch.FloatTensor(x[:, None])
yt = torch.FloatTensor(y[:, None])
In [44]:
def mse(y_target, y_calc):
    return ((y_target - y_calc) ** 2).mean()
In [45]:
class LinearLayer(nn.Module):
    def __init__(self, a, b):
        super(LinearLayer, self).__init__()  # initialize torch functionality
        # change a and b to float tensors, and then to parameters;
        # the main difference between a tensor and a parameter is that a parameter
        # is registered with the module and tracks gradients during training
        self.a = nn.Parameter(torch.FloatTensor([a]).view(1, 1))
        self.b = nn.Parameter(torch.FloatTensor([b]))

    # forward is similar to Python's __call__ but also hooks into torch functionality
    def forward(self, x):
        return x @ self.a + self.b  # linear equation; @ is matrix multiplication for tensors

    def update(self, lr):
        with torch.no_grad():  # when updating parameters, we have to switch off gradient tracking
            self.a.sub_(lr * self.a.grad)  # inplace update of parameter a
            self.a.grad.zero_()  # clear gradient

            self.b.sub_(lr * self.b.grad)
            self.b.grad.zero_()
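For comparison (a sketch, not part of the exercise), the same model can be built from the library's nn.Linear layer, which stores the weight and bias as parameters itself:

linear = nn.Linear(1, 1)  # one input feature, one output feature
with torch.no_grad():
    linear.weight.fill_(-1.1)  # plays the role of a
    linear.bias.fill_(0.2)     # plays the role of b
print(linear(xt).shape)  # torch.Size([100, 1])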
In [46]:
model = LinearLayer(-1.1, 0.2)
In [47]:
def torchStep(x, y, model, lr):
    y_calc = model(x)  # calculate the output of our model
    loss = mse(y, y_calc)  # calculate the loss
    loss.backward()  # calculate all gradients
    model.update(lr)  # update parameters
    return loss, y_calc
In [48]:
loss, y_calc = torchStep(xt, yt, model, lr)
y_calc = y_calc.detach().cpu()
fig = plt.figure()
plt.scatter(xt[:, 0], yt)
line, = plt.plot(xt[:, 0], y_calc, c="orange")
plt.close()


def animate(i):
    loss, y_calc = torchStep(xt, yt, model, lr)
    y_calc = y_calc.detach().cpu()
    line.set_ydata(y_calc)
    return (line,)


animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)
Out[48]:
[animation: PyTorch model converging to the data]
In [49]:
# we can use an optimizer to update parameters based on their gradients;
# the simplest one is stochastic gradient descent (SGD)
def torchStep2(x, y, model, optim):
    optim.zero_grad()  # clear gradients
    y_calc = model(x)  # calculate output of model
    loss = mse(y, y_calc)  # calculate loss
    loss.backward()  # calculate all gradients
    optim.step()  # perform an optimizer step, which updates the parameters
    return loss, y_calc
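Because torchStep2 only calls optim.zero_grad() and optim.step(), any optimizer from torch.optim can be dropped in without other changes; for example (illustrative), Adam adapts the step size per parameter:

optim = torch.optim.Adam(model.parameters(), lr=0.01)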
In [50]:
model = LinearLayer(-1.1, 0.2)
optim = torch.optim.SGD(model.parameters(), lr)
In [51]:
loss, y_calc = torchStep2(xt, yt, model, optim)
y_calc = y_calc.detach().cpu()
fig = plt.figure()
plt.scatter(xt[:, 0], yt)
line, = plt.plot(xt[:, 0], y_calc, c="orange")
plt.close()


def animate(i):
    loss, y_calc = torchStep2(xt, yt, model, optim)
    y_calc = y_calc.detach().cpu()
    line.set_ydata(y_calc)
    return (line,)


animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)
Out[51]:
[animation: SGD optimizer fitting the data]

Part 2: Convolution layer¶

In [52]:
# input image
image = np.array(
    [
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
        [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
    ]
)
In [53]:
plt.imshow(image)
Out[53]:
<matplotlib.image.AxesImage at 0x14fc3a190>
[figure: the input image]

2.1) Write a function which calculates the convolution of an input matrix (image) with a 3x3 kernel (mask) and a bias. Do not use padding, so the output image has size (input_width - 2) x (input_height - 2).

InĀ [67]:
from scipy.signal import convolve2d

def Convolution(image, kernel, bias):
    img_out = np.zeros((image.shape[0] - 2, image.shape[1] - 2))
    for i in range(image.shape[0]-2):
        for j in range(image.shape[1]-2):
            subarray = image[i:i+3, j:j+3]
            img_out[i][j] = np.sum(subarray * kernel) + bias
    # img_out = convolve2d(image, kernel, mode='valid')
    return img_out
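A quick sanity check of the implementation (illustrative): a 3x3 mean filter with zero bias over a constant image of ones should return a (rows - 2) x (cols - 2) array of ones.

print(Convolution(np.ones((4, 4)), np.ones((3, 3)) / 9, 0))
# [[1. 1.]
#  [1. 1.]]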
In [64]:
# kernel (mask) which is a mean filter
kernel = np.ones((3, 3)) / 9
kernel
Out[64]:
array([[0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111],
       [0.11111111, 0.11111111, 0.11111111]])
In [65]:
bias = -0.5
In [68]:
img_out = Convolution(image, kernel, bias)
In [69]:
plt.imshow(img_out)
Out[69]:
<matplotlib.image.AxesImage at 0x31b7a7310>
[figure: image after the mean filter]

2.2) Find kernels (masks) that detect horizontal and vertical lines. Pixels belonging to a line should get values greater than zero and all others values less than zero. Use 3x3 masks.

Example:
print(Convolution(np.array([[0,0,0,0,0],[0,0,0,0,0],[1,1,1,1,1],[0,0,0,0,0],[0,0,0,0,0]]), kernel_horizontal, -2))
[[-1. -1. -1.]
 [ 1.  1.  1.]
 [-1. -1. -1.]]

In [71]:
# classic line-detection kernel: a strong positive response only when the center row is a line
kernel_horizontal = np.array([[-1, -1, -1], [2, 2, 2], [-1, -1, -1]])
In [72]:
img_horizontal = Convolution(image, kernel_horizontal, -2)
plt.imshow(img_horizontal)
Out[72]:
<matplotlib.image.AxesImage at 0x31bf24e20>
[figure: horizontal-line detection response]
In [76]:
kernel_vertical = kernel_horizontal.T
In [77]:
img_vertical = Convolution(image, kernel_vertical, -2)
plt.imshow(img_vertical)
Out[77]:
<matplotlib.image.AxesImage at 0x31c011280>
[figure: vertical-line detection response]

2.3) Complete the function to calculate ReLU.

In [82]:
def relu(x):
    return np.maximum(0, x)
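For example (a quick check): relu zeroes out negative values and keeps positive ones:

print(relu(np.array([-1.5, 0.0, 2.0])))  # [0. 0. 2.]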

2.4) Find bias values such that output image pixels have a value above 0 only if the original pixel is part of a horizontal/vertical line.
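One way to reason about the bias (a sketch, assuming the line-detection kernels above): a kernel centered on an interior pixel of a horizontal line scores $2 \cdot 3 = 6$, while the strongest responses away from such a line (e.g., on an isolated vertical segment) stay around 1-2, so the bias of $-2$ already used above keeps only line pixels positive after ReLU.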

In [83]:
plt.imshow(relu(img_horizontal))
plt.show()
plt.imshow(relu(img_vertical))
[figure: ReLU of the horizontal-line response]
Out[83]:
<matplotlib.image.AxesImage at 0x31c3b4310>
[figure: ReLU of the vertical-line response]

Part 3: Deep network¶

In [84]:
import pandas as pd
In [86]:
# load iris dataset
df = pd.read_csv('iris.csv')
In [87]:
# n - number of elements in the dataset
n = len(df)
In [88]:
# useful variables
feature_columns = ["sepal.length", "sepal.width", "petal.length", "petal.width"]
target_column = "variety"
class_number = 3
feature_number = 4
In [89]:
# dictionaries used to map class numbers to names and back
number_to_name = {0: "Setosa", 1: "Versicolor", 2: "Virginica"}
name_to_number = {"Setosa": 0, "Versicolor": 1, "Virginica": 2}
In [90]:
# convert class names to numeric labels
df[target_column] = df[target_column].apply(lambda x: name_to_number[x])
In [97]:
# take raw numpy data
x = df[feature_columns].values
y = df[target_column].values
In [98]:
# normalize the data so that each network input has mean 0 and standard deviation 1
x = (x - x.mean(0)) / x.std(0)
print(x.mean(0))
print(x.std(0))
[-4.73695157e-16 -7.81597009e-16 -4.26325641e-16 -4.73695157e-16]
[1. 1. 1. 1.]
In [99]:
# convert numpy arrays to torch tensors
x = torch.FloatTensor(x)
y = torch.LongTensor(y)
In [100]:
# a simple neural network with one hidden layer of hidden_nr neurons:
# input_layer computes features which hidden_layer uses to calculate the prediction;
# between input_layer and hidden_layer there is relu as a nonlinear activation function;
# after hidden_layer a sigmoid squashes each class score into the range [0, 1]
# (note: nn.CrossEntropyLoss used below applies log-softmax internally, so raw scores would also work)
class Net(nn.Module):
    def __init__(self, input_nr, hidden_nr, output_nr):
        super(Net, self).__init__()
        self.input_layer = nn.Linear(input_nr, hidden_nr)
        self.hidden_layer = nn.Linear(hidden_nr, output_nr)

    def forward(self, x):
        x = self.input_layer(x)
        x = torch.relu(x)
        x = self.hidden_layer(x)
        return torch.sigmoid(x)
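A quick shape check (illustrative): a batch of two 4-feature inputs should produce two rows of three class scores.

net = Net(4, 5, 3)
print(net(torch.randn(2, 4)).shape)  # torch.Size([2, 3])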

The cross entropy loss equals $-\mathbb{1}[y=0]\log(p_0) - \mathbb{1}[y=1]\log(p_1) - \mathbb{1}[y=2]\log(p_2)$, where $p_0, p_1, p_2$ are the calculated probabilities of classes 0, 1, 2, and $\mathbb{1}[y=k]$ is 1 when the sample's target class is $k$ and 0 otherwise.
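As a small illustration (not part of the assignment), this loss can be reproduced by hand, since nn.CrossEntropyLoss applies log-softmax to its inputs internally:

logits = torch.tensor([[2.0, 0.5, 0.1]])
target = torch.tensor([0])
print(-torch.log_softmax(logits, dim=1)[0, 0])  # manual cross entropy for class 0
print(nn.CrossEntropyLoss()(logits, target))    # same value (about 0.32)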

In [101]:
loss_func = nn.CrossEntropyLoss()
In [102]:
# accuracy is the fraction of samples classified correctly
def Accuracy(y_target, y_calc):
    prediction_class = y_calc.argmax(1)  # index of the highest class score per sample
    number_of_correct = (prediction_class == y_target).float().sum()
    return number_of_correct / len(y_target)
In [103]:
def Step(x, y, model, optim):
    optim.zero_grad()
    y_calc = model(x)
    loss = loss_func(y_calc, y)
    loss.backward()
    optim.step()
    acc = Accuracy(y, y_calc)
    return loss, y_calc, acc
In [115]:
# Train trains the model for `epoch` steps and collects metrics (loss and accuracy)
def Train(x, y, model, optim, epoch):
    losses = []
    accuracies = []
    for i in range(epoch):
        loss, y_calc, acc = Step(x, y, model, optim)
        losses.append(loss.detach().numpy())
        accuracies.append(acc)
    return losses, accuracies
In [105]:
lr = 0.1
In [116]:
# create a model and an optimizer
hidden_nr = 5
model = Net(feature_number, hidden_nr, class_number)
optim = torch.optim.SGD(model.parameters(), lr)
In [117]:
epoch = 200
losses, accuracies = Train(x, y, model, optim, epoch)
In [119]:
plt.plot(losses)
plt.show()
plt.plot(accuracies) 
[figure: training loss per epoch]
Out[119]:
[<matplotlib.lines.Line2D at 0x34af1d220>]
[figure: training accuracy per epoch]

Part 3:¶

3.1) Create a report testing different values of the learning rate and of the number of neurons in the hidden layer. Run every test 10 times with 200 epochs. Plot the mean loss and the mean accuracy for each tested value.

test case 1:
learning rate: [1, 0.5, 0.1, 0.01, 0.001]
number of neurons in hidden layer: 10

test case 2:
number of neurons in hidden layer: [1, 2, 5, 10, 20, 100]
learning rate: 0.1

Testing different learning rates¶

In [146]:
hidden_nr = 10
losses_arr = []
accuracies_arr = []
for lr in [1, 0.5, 0.1, 0.01, 0.001]:
    ls = []
    acc = []
    for _ in range(10):
        model = Net(feature_number, hidden_nr, class_number)
        optim = torch.optim.SGD(model.parameters(), lr)
        losses, accuracies = Train(x, y, model, optim, 200)
        ls.append(losses[-1])
        acc.append(accuracies[-1])
    losses_arr.append(sum(ls)/10)
    accuracies_arr.append(sum(acc)/10)
In [147]:
plt.plot([0.001, 0.01, 0.1, 0.5, 1], losses_arr[::-1], '-o')
plt.xscale('log')
plt.xlabel('Value of learning rate')
plt.ylabel('Average loss')
plt.title('Loss depending on value of learning rate')
plt.show()
[figure: average final loss vs. learning rate]
In [148]:
plt.plot([0.001, 0.01, 0.1, 0.5, 1], accuracies_arr[::-1], '-o')
plt.xscale('log')
plt.xlabel('Value of learning rate')
plt.ylabel('Average accuracy')
plt.title('Accuracy depending on value of learning rate')
plt.show()
[figure: average final accuracy vs. learning rate]

Testing different number of neurons¶

In [149]:
lr = 0.1
losses_arr = []
accuracies_arr = []
for nr_of_neurons in [1, 2, 5, 10, 20, 100]:
    ls = []
    acc = []
    for _ in range(10):
        model = Net(feature_number, nr_of_neurons, class_number)
        optim = torch.optim.SGD(model.parameters(), lr)
        losses, accuracies = Train(x, y, model, optim, 200)
        ls.append(losses[-1])
        acc.append(accuracies[-1])
    losses_arr.append(sum(ls)/10)
    accuracies_arr.append(sum(acc)/10)
In [150]:
plt.plot([1, 2, 5, 10, 20, 100], losses_arr, '-o')
plt.xscale('log')
plt.xlabel('Number of neurons in hidden layer')
plt.ylabel('Average loss')
plt.title('Loss depending on number of neurons in hidden layer')
plt.show()
[figure: average final loss vs. number of hidden neurons]
In [151]:
plt.plot([1, 2, 5, 10, 20, 100], accuracies_arr, '-o')
plt.xscale('log')
plt.xlabel('Number of neurons in hidden layer')
plt.ylabel('Average accuracy')
plt.title('Accuracy depending on number of neurons in hidden layer')
plt.show()
[figure: average final accuracy vs. number of hidden neurons]